Delocalization transition for the Google matrix






Olivier Giraud, 1,2 Bertrand Georgeot, 1,2 and Dima L. Shepelyansky 1,2

1 Université de Toulouse, UPS, Laboratoire de Physique Théorique (IRSAMC), F-31062 Toulouse, France
2 CNRS, LPT (IRSAMC), F-31062 Toulouse, France 
(Dated: March 30, 2009) 

We study the localization properties of eigenvectors of the Google matrix, generated both from 
the World Wide Web and from the Albert-Barabasi model of networks. We establish the emergence 
of a delocalization phase for the PageRank vector when network parameters are changed. In the
phase of localized PageRank, a delocalization takes place in the complex plane of eigenvalues of
the matrix, leading to delocalized relaxation modes. We argue that the efficiency of information 
retrieval by Google-type search is strongly affected in the phase of delocalized PageRank. 

PACS numbers: 89.20.Hh, 89.75.Hc, 05.40.Fb, 72.15.Rn 



The World Wide Web (WWW) is an enormously large
network with about $10^{11}$ webpages all over the world. Information
retrieval in such a huge database is therefore
a formidable task. An efficient method to search this
database, known as the PageRank Algorithm (PRA), was
put forward by Brin and Page [1] and formed the basis of
the Google search engine, by far the most popular one.
The PRA is based on the construction of the Google matrix
$G$, which sums up the network structure in a tractable
way and can be written as (see e.g. [2] for details)



$$G = \alpha S + (1 - \alpha)\, E/N. \qquad (1)$$



The matrix $S$ is constructed from the adjacency matrix
of the network. For a directed network of $N$ nodes, the
$N \times N$ adjacency matrix $A$ is defined by $A_{ij} = 1$ if there
is a link from node $j$ to node $i$, and $A_{ij} = 0$ otherwise.
For networks with undirected links, $A$ is a real symmetric
matrix. However, the WWW corresponds to a network
with directed links and here $A$ is not symmetric. The matrix
$S$ is built from $A$ by normalizing each nonzero column
through $S_{ij} = A_{ij} / \sum_k A_{kj}$ and replacing by $1/N$ the
elements of columns with only zero elements. The matrix
$S$ can be viewed as the mathematical description of
a surfer on the network. At each iteration he leaves a
node by randomly choosing an outgoing link with equal
probability, and in the absence of such links he goes to
an arbitrary node at random. The Google matrix $G$ defined
by Eq. (1) (with matrix $E$ such that all $E_{ij} = 1$)
can be interpreted as a modification of $S$ where with finite
probability $1 - \alpha$ the surfer might jump to another
node at random. Usually the PRA uses $\alpha = 0.85$ and we
concentrate our studies on this case.
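The construction of $G$ from the adjacency matrix can be illustrated by a minimal sketch in plain Python. The function name `google_matrix` and the dense list-of-lists storage are assumptions made here for illustration only; they are adequate for toy examples, not for WWW-sized sparse matrices.

```python
def google_matrix(adjacency, alpha=0.85):
    """Build G = alpha*S + (1 - alpha)*E/N from an adjacency matrix.

    adjacency[i][j] = 1 if there is a link from node j to node i, else 0.
    """
    n = len(adjacency)
    # Number of outgoing links of node j = sum of column j of A.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if col_sums[j] > 0:
                s_ij = adjacency[i][j] / col_sums[j]  # normalized nonzero column
            else:
                s_ij = 1.0 / n                        # column of zeros replaced by 1/N
            G[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return G


# Toy network: node 0 links to nodes 1 and 2, node 1 links to node 2,
# node 2 has no outgoing links.
A = [[0, 0, 0],
     [1, 0, 0],
     [1, 1, 0]]
G = google_matrix(A)
column_sums = [sum(G[i][j] for i in range(3)) for j in range(3)]
```

Every column of the resulting matrix sums to one, so $G$ conserves the total probability of the random surfer.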

The matrix $G$ has only one maximal eigenvalue $\lambda = 1$.
The corresponding PageRank eigenvector with components
$p_j$ gives the stationary distribution of the random
surfer over the network. All $p_j$ are positive real numbers
normalized by $\sum_j p_j = 1$. All nodes in the WWW can be
ordered by decreasing $p_j$ values and thus this PageRank
vector is of primary importance for ordering of websites
and information retrieval. The vector can be found by
iterative applications of $G$ on an initial random vector.
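The iterative procedure can be sketched as a simple power iteration. This is an illustrative sketch, not the production algorithm. The function name `pagerank`, the tolerance and the uniform starting vector are assumptions (the text mentions a random initial vector; any positive normalized vector converges to the same result).

```python
def pagerank(G, tol=1e-12, max_iter=10_000):
    """Return the stationary vector p with G p = p and sum(p) = 1."""
    n = len(G)
    p = [1.0 / n] * n                     # assumed starting vector (uniform)
    for _ in range(max_iter):
        new_p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(new_p)
        new_p = [x / total for x in new_p]            # keep sum_j p_j = 1
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy 3-node Google matrix for alpha = 0.85 (each column sums to 1).
G = [[0.05, 0.05, 1 / 3],
     [0.475, 0.05, 1 / 3],
     [0.475, 0.9, 1 / 3]]
p = pagerank(G)
# Nodes ordered by decreasing PageRank value.
ranking = sorted(range(len(p)), key=lambda j: -p[j])
```

Since all eigenvalues other than $\lambda = 1$ have modulus at most $\alpha$, the error shrinks at least by a factor $\alpha$ per iteration, which is why a few hundred iterations suffice here.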



This PRA works efficiently due to the relatively small 
average number of links in the WWW. The WWW is 
indeed described by a very sparse adjacency matrix A, 
with only about ten nonzero entries per column. 

Numerical studies of the PageRank vector for large
subsets of the WWW have shown that it is satisfactorily
described by an algebraic decay $p_j \sim 1/j^{\beta}$, where $j$ is
the ordered index, and thus the number of nodes $N_n$ with
PageRank $p$ scales as $N_n \sim 1/p^{\nu}$ with numerical values
$\nu = 1 + 1/\beta \approx 2.1$ and $\beta \approx 0.9$. This implies that the
PageRank vector is not ergodic, displaying certain localization
properties over specific sites of the network. The
localization properties of eigenvectors of real symmetric
matrices describing various complex networks have been
studied recently. For systems of small-world type it was
shown that eigenvectors display a transition from localized
to delocalized states when the density of long-range
links is changed [4, 5]. Such a delocalization transition
has certain similarities with the Anderson transition for
waves in systems with disorder [6]. More specific studies
were performed for the symmetric adjacency matrix
of the Internet network, showing that the localization of
eigenvectors strongly depends on the eigenvalue location
in the spectrum, and allows one to identify isolated communities
[7]. The global localization properties averaged over
the spectrum were also recently considered in [8] for various
undirected networks. The studies above were performed
for symmetric adjacency matrices of undirected
networks, characterized by real eigenvalues. In contrast,
the Google matrix is constructed on the basis of directed
links, and thus its spectrum is generally complex. We
note that the case of complex spectra in quantum mechanics
was studied in relation to poles of scattering problems
(see e.g. [9]) but it remains less explored than the
case of real spectra.

In this Letter, we study the localization properties of
the Google matrix $G$ for models of realistic directed networks
and actual subsets of the WWW. We characterize
the properties of right eigenstates $\psi_i$ ($G\psi_i = \lambda_i \psi_i$) as a
function of the complex eigenvalue $\lambda$. Special emphasis
is given to the properties of the PageRank vector, which
is of great importance for the Google search. Our findings
show that eigenstates with complex $\lambda$ are generally
delocalized over the whole network. At the same time,
the PageRank vector may be localized or delocalized depending
on the properties of the network. Such delocalization
may seriously affect the efficiency of the ranking
through the PRA. We note that the PRA has recently
found new types of applications, e.g. for academic ranking
from citation networks [10]. It is rather probable that
the PRA will find broad application for classification in
various types of complex networks [11] and hence the
understanding of global properties of the Google matrix
becomes very important.

FIG. 1: (Color online) Distribution of eigenvalues $\lambda_i$ of Google
matrices in the complex plane. Color is proportional to the
IPR $\xi$ of the associated eigenvector $\psi_i$. Top panel: AB model
with $q = 0.1$ for $N = 2^{14}$, $N_r = 5$ random realizations, $\xi$
varies from $\xi = 32$ (blue/black) to $\xi = 1656$ (red/grey);
middle panel: same with $q = 0.7$, $\xi$ varies from $\xi = 1169$
(red/grey) to $\xi = 3584$ (purple/dark grey); bottom panel:
data for a University network (Liverpool J. Moores Univ.,
LJMU) with $N = 13578$ and $N_r = 5$ (see text), $\xi$ varies from
$\xi = 7$ (blue/black) to $\xi = 1177$ (red/grey).

To generate Google matrices $G$ we use data from real
subsets of the WWW, namely University networks taken
from [12]. In addition, we generate networks with directed
links using the Albert-Barabási (AB) procedure
[13] to construct the associated $G$ matrix. AB networks
are built by an iterative process. Starting from $m$ nodes,
at each step $m$ links are added to the existing network
with probability $p$, or $m$ links are rewired with probability
$q$, or a new node with $m$ links is added with probability
$1 - p - q$. In each case the end node of new links is
chosen with preferential attachment, i.e. with probability
$(k_i + 1)/\sum_j (k_j + 1)$, where $k_i$ is the total number of
incoming and outgoing links of node $i$. This mechanism
generates directed networks having the small-world and
scale-free properties, depending on the values of $p$ and
$q$. The results we display are averaged over $N_r$ random
realizations of the network to improve the statistics. In
our studies we chose $m = 5$, $p = 0.2$ and two values
of $q$ corresponding to scale-free ($q = 0.1$) and exponential
($q = 0.7$) regimes of link distributions (see Fig. 1
in [13] for undirected networks). For our directed networks
at $q = 0.1$, we find properties close to the behavior
for the WWW with the cumulative distribution
of ingoing links showing algebraic decay $P_c^{in}(k) \sim 1/k$
and average connectivity $\langle k \rangle \approx 6.4$. For $q = 0.7$ we
find $P_c^{in}(k) \sim \exp(-0.03k)$ and $\langle k \rangle \approx 15$. For outgoing
links, the numerical data are compatible with an exponential
decay in both cases with $P_c^{out}(k) \sim \exp(-0.6k)$
for $q = 0.1$ and $P_c^{out}(k) \sim \exp(-0.1k)$ for $q = 0.7$. We
checked that small variations of parameters $m$, $p$, $q$ near
the chosen values do not qualitatively affect the properties
of the $G$ matrix.
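The three growth moves of the AB procedure can be sketched as follows. This is only an illustration under explicit assumptions that the text does not specify: the start node of an added link is drawn uniformly at random, a rewired link keeps its start node, multiple links between the same pair are allowed, and self-links are excluded. The function name `ab_directed_network` is hypothetical.

```python
import random


def ab_directed_network(n_nodes, m=5, p=0.2, q=0.1, seed=0):
    """Grow a directed network; return a list of (source, target) links."""
    rng = random.Random(seed)
    n = m                      # start from m isolated nodes
    links = []                 # directed links (source, target)
    degree = [0] * m           # k_i = incoming + outgoing links of node i

    def preferential_target(exclude):
        # End node i != exclude, chosen with probability ~ (k_i + 1).
        candidates = [i for i in range(n) if i != exclude]
        weights = [degree[i] + 1 for i in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    def add_link(src):
        dst = preferential_target(src)
        links.append((src, dst))
        degree[src] += 1
        degree[dst] += 1

    while n < n_nodes:
        r = rng.random()
        if r < p:                               # add m links (assumed: random start node)
            for _ in range(m):
                add_link(rng.randrange(n))
        elif r < p + q:                         # rewire m links (assumed: start node kept)
            for _ in range(min(m, len(links))):
                idx = rng.randrange(len(links))
                src, old_dst = links[idx]
                degree[old_dst] -= 1
                new_dst = preferential_target(src)
                links[idx] = (src, new_dst)
                degree[new_dst] += 1
        else:                                   # add a new node with m links
            degree.append(0)
            n += 1
            for _ in range(m):
                add_link(n - 1)
    return links


links = ab_directed_network(300, m=5, p=0.2, q=0.1, seed=1)
```

The list of links can then be turned into an adjacency matrix with $A_{ij} = 1$ for each link from node $j$ to node $i$. The target selection here costs $O(N)$ per link, which is fine for a sketch but too slow for the largest sizes considered in this Letter.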

To characterize localization properties of eigenvectors
$\psi_i$, we use the Inverse Participation Ratio (IPR) defined
by $\xi = \left(\sum_j |\psi_i(j)|^2\right)^2 / \sum_j |\psi_i(j)|^4$. It gives the effective
number of nodes on which an eigenstate is localized.
In Fig. 1 we show the distribution of eigenvalues
together with the IPR for the AB model and the WWW.
In the latter case, to improve the statistics we randomize
the links, keeping fixed the number of links at any
given node as proposed in [14]. In all cases the spectrum
consists of an isolated eigenvalue $\lambda = 1$ together with an
approximately circular distribution centered at $\lambda = 0$ (a
significant fraction of about 30-50% of states has $\lambda = 0$).
In all three cases there are circular rings of states with
high IPR, indicating that in this region the states become
delocalized in the limit of large matrix sizes. The delocalized
domain is largest for the AB model at $q = 0.7$, where
almost all states have high IPR, including the PageRank
vector. By contrast, at $q = 0.1$ the PageRank has small
IPR while large IPR values appear only in a ring centered at
$\lambda = 0$. We observe a similar behavior for the WWW
data, where the ring of delocalized states is narrower and
the PageRank has even smaller IPR.
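The IPR defined above is straightforward to evaluate for any eigenvector. The short sketch below (the function name `ipr` is an assumption) shows the two limiting cases: a vector spread uniformly over $M$ nodes gives $\xi = M$, while a vector concentrated on a single node gives $\xi = 1$.

```python
def ipr(psi):
    """Inverse Participation Ratio: (sum_j |psi_j|^2)^2 / sum_j |psi_j|^4."""
    s2 = sum(abs(x) ** 2 for x in psi)
    s4 = sum(abs(x) ** 4 for x in psi)
    return s2 * s2 / s4


# Fully delocalized vector on 8 nodes (complex components are allowed).
uniform = [0.5j] * 8
# Vector localized on a single node out of 8.
localized = [0.0] * 7 + [3.0]

xi_uniform = ipr(uniform)
xi_localized = ipr(localized)
```

Because the numerator and denominator scale in the same way, the value of $\xi$ does not depend on how the eigenvector is normalized.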

In Figs. 2 and 3 we study the dependence on system size
$N$. We computed the normalized density of states $W(\gamma)$
($\int_0^\infty W(\gamma)\, d\gamma = 1$), where $\gamma = -2 \ln |\lambda|$ is the relaxation
rate to the equilibrium PageRank state. For the AB model,
in both cases the density $W(\gamma)$ is independent of system
size, showing that we have reached the asymptotic
regime of large networks. The characteristic features of
the density are the appearance of a gap between $\gamma = 0$
and $\gamma = \gamma_c \approx 2$ to $3$, followed by a sharp increase with
a maximum around $\gamma \approx 3$ to $4$ and a slow decrease for
larger $\gamma$. The three models have a similar structure of
$W(\gamma)$, with $\gamma_c$ being not very sensitive to the value of $\alpha$.
We note that the presence of $\alpha$ in Eq. (1) ensures that
$\gamma_c \geq \gamma_\alpha = 2 |\ln \alpha|$ [2]. For $\alpha = 0.85$ this gives $\gamma_\alpha \approx 0.33$,
which is significantly smaller than the numerical value of
$\gamma_c$. This means that all three models have an intrinsic
gap that explains the stability of $\gamma_c$ to variations of $\alpha$.
It is known that for WWW networks usually $\gamma_c = \gamma_\alpha$.
Indeed, we found that for University networks taken by
us from [12] most often this relation was approximately
satisfied (including for LJMU). However, randomization
of links following the procedure of [14] generally increases
the size of the gap (see Fig. 1). In order to test the effect
of a smaller gap on our results, we also considered a
modification of the AB model where nodes are labeled by
an additional "color" index, which leads to the appearance
of additional eigenvalues in the gap. This model gives
qualitatively similar results to the models presented here
and will be discussed elsewhere.

FIG. 2: (Color online) Normalized density of states $W$ (top
panel) and IPR (bottom panel) as a function of $\gamma$. Data for
the AB model with $q = 0.1$ are shown by full curves with, from
bottom to top, $N = 2^{10}$ ($N_r = 100$) (black), $2^{11}$ ($N_r = 50$)
(red), $2^{12}$ ($N_r = 20$) (green), $2^{13}$ ($N_r = 10$) (blue), $2^{14}$ ($N_r = 5$)
(violet). Symbols give the PageRank value of $\xi$ in the same
order: circle, square, diamond, triangle down and triangle up.
All curves coincide on the top panel. Dashed curves show the
data from the WWW (LJMU network, parameters of Fig. 1).

While in Figs. 2 and 3 $W(\gamma)$ is not sensitive to matrix size,
the IPR clearly grows with $N$ for $\gamma > \gamma_d$, where $\gamma_d$ can
be viewed as a delocalization edge in $\gamma$. For the AB model
at $q = 0.7$, $\gamma_d = 0$ since even the PageRank IPR grows
with $N$. By contrast, for $q = 0.1$, the PageRank IPR stays
constant and $\gamma_d$ is close to but larger than $\gamma_c \approx 2$. Data
from WWW show a similar behavior of IPR for fixed
matrix size $N$.
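The relaxation rate $\gamma = -2 \ln |\lambda|$ and the lower bound $2 |\ln \alpha|$ on the gap used in this discussion can be checked with a few lines of arithmetic. The sketch below only evaluates these formulas; the function name `relaxation_rate` is an assumption. The bound follows from the fact that, apart from $\lambda = 1$, the eigenvalues of $G$ are those of $S$ multiplied by $\alpha$, so that $|\lambda| \leq \alpha$.

```python
import math


def relaxation_rate(lam):
    """Relaxation rate gamma = -2 ln|lambda| of a mode with eigenvalue lambda."""
    return -2.0 * math.log(abs(lam))


alpha = 0.85
# Smallest possible gap: a mode sitting exactly at |lambda| = alpha.
gamma_alpha = relaxation_rate(alpha)        # equals 2*|ln(alpha)|

# A complex eigenvalue of modulus exp(-1) relaxes with gamma = 2.
gamma_example = relaxation_rate(math.exp(-1.0) * complex(math.cos(1.0), math.sin(1.0)))
```

For $\alpha = 0.85$ the bound evaluates to about 0.325, consistent with the value 0.33 quoted in the text, whereas a gap edge at $\gamma \approx 2$ corresponds to eigenvalues of modulus $e^{-1} \approx 0.37$.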




FIG. 3: (Color online) Same as in Fig. 2 for the AB model at
$q = 0.7$.

A detailed analysis of the dependence of IPR on $N$ is
shown in Fig. 4 for PageRank and bulk states with
$\gamma > \gamma_c$. For bulk states we find that IPR grows with
$N$ as $\xi \sim N^{\mu}$ with $\mu \approx 0.9$ (AB model) and $\mu \approx 0.5$
(WWW data). WWW data in Fig. 4 are taken from actual
links of various University networks without any randomization,
which explains a stronger dispersion of data
(the largest non-randomized case, $N = 13578$, corresponds to
the network LJMU used in Figs. 1 and 2). The data definitely
show that delocalization takes place in the bulk
states. By contrast, the PageRank remains localized for
WWW data ($\mu = 0.01 \ll 1$) and for the AB model at $q = 0.1$
($\mu = 0.1 \ll 1$), while for $q = 0.7$ the PageRank is clearly
delocalized ($\mu = 0.8$).
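An exponent $\mu$ of this kind is the slope of a straight-line fit of $\log_{10} \xi$ against $\log_{10} N$. The sketch below shows such a least-squares fit. The function name `fit_exponent` is an assumption, and the input values are synthetic numbers generated from an exact power law purely to exercise the code; they are not data from this Letter.

```python
import math


def fit_exponent(sizes, iprs):
    """Least-squares slope of log10(xi) versus log10(N)."""
    xs = [math.log10(n) for n in sizes]
    ys = [math.log10(v) for v in iprs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic illustration only: xi = 0.5 * N**0.9 for N = 2^10 ... 2^14.
sizes = [2 ** k for k in range(10, 15)]
synthetic_iprs = [0.5 * n ** 0.9 for n in sizes]
mu = fit_exponent(sizes, synthetic_iprs)
```

A slope close to 1 means that the eigenvector spreads over a finite fraction of the network, while a slope close to 0 means that the number of occupied nodes stays fixed as the network grows.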

The distribution of the eigenvector components is
shown in Fig. 5 for the AB model. For $q = 0.1$ the PageRank
is only slightly modified when $N$ is increased by a factor
of 32, showing a decay $\psi_i(j) \sim j^{-\beta}$ with fitted value
$\beta = 0.8$, close to the WWW value $\beta = 0.9$ [3]. The cumulative
PageRank distribution $P_c(p_j)$ displayed in the
inset also shows a good agreement with WWW data. By
contrast, for $q = 0.7$, the PageRank shows a flat distribution
over a number of nodes which increases with system
size, corresponding to a delocalization regime. The states
in the bulk are delocalized for both values of $q$.

The obtained results show that localization properties 
of the PageRank vector depend on the type of networks. 
Even rather similar networks described by the same AB 
model with just one parameter changed show two qualitatively
different behaviors. In one case, which is closer to
scale-free networks, the localized PageRank is distributed
essentially on a finite number of nodes (finite IPR) while 
in the other case, closer to small-world type, the delocal- 
ized PageRank is spread over a number of nodes which 

FIG. 4: (Color online) Dependence of $\xi$ on matrix size $N$ for
the AB model at $q = 0.1$ (triangles), $q = 0.7$ (circles), and for
WWW data without randomization (squares). Full symbols
are for PageRank $\xi$ values, empty symbols are for eigenvectors
with $3 < \gamma < 4$ (AB model) or for the 10 eigenvectors with
highest $\xi$ and $\gamma < 10$ (WWW data). For the AB model $N_r$ is as
in Fig. 2 and $N_r = 5$ for $N > 2^{14}$ (statistical error bars are
smaller than symbol size). Dotted blue lines give linear fits of
WWW data, with slopes respectively 0.01 and 0.53. Upper
dashed line indicates the slope 1. Logarithms are decimal.



FIG. 5: (Color online) Dependence of eigenvectors $\psi_i(j)$ of
the AB model on index $j$ ordered in decreasing PageRank values
$p_j$ (with normalization $\sum_j |\psi_i(j)|^2 = 1$ and $\sum_j p_j = 1$). Full
smooth curves are PageRank vectors for $N = 2^{14}$, dashed
smooth curves for $N = 2^{19}$. Non-smooth curves are eigenvectors
($N = 2^{14}$) within $3 < \gamma < 4$ with $|\psi_i(j)|^2$ averaged in this
interval. States are averaged over $N_r = 5$ random networks.
Black is for $q = 0.1$, red/grey for $q = 0.7$. Inset: cumulative
distribution $P_c(p_j)$ normalized by $P_c(0) = N$ for the AB model
($N = 2^{18}$ and $N_r = 5$) at $q = 0.1$ (full black) and $q = 0.7$
(dashed red/grey), and for LJMU non-randomized data (full
red/grey). Dashed straight line indicates slope $1 - \nu = -1$.
Logarithms are decimal.



grows indefinitely with system size. The transition between
the two regimes can be viewed as a delocalization
transition in the Google matrix. Our studies show that 
actual WWW networks are located in the localized phase. 
The transition to the delocalized phase can drastically af- 
fect the efficiency of the Google search. Indeed, in the 
delocalized phase the PRA still efficiently converges to 
a well-defined PageRank vector, which is however homo- 
geneously spread practically over the whole network. In 
such a situation the classification of nodes by PageRank 
values remains possible but gives almost no significant 
information. We note that this delocalization transition
can take place even in presence of a large gap in the spec- 
trum of the Google matrix. The above transition takes 
place for the PageRank when changing parameters of the 
network. For fixed parameters, we also observe a delocalization
transition in the complex plane of eigenvalues $\lambda$.
This means that the modes which describe relaxation to
the PageRank are generally delocalized over the whole
network for a broad range of relaxation rates $\gamma$. This
transition is reminiscent of the Anderson transition near 
the mobility edge in energy eigenvalues. Further stud- 
ies are required in order to fully understand the physical 
origins of these transitions and their dependence on the 
characteristics of the networks. 